cv.nfeaturesLDA {animation}    R Documentation

Cross-validation to find the optimum number of features (variables) in LDA

Description

In a classification problem, we usually want to use as few variables as possible because of the difficulties that high dimensionality brings. This function illustrates the process of finding the optimum number of variables using k-fold cross-validation in a linear discriminant analysis (LDA).

Usage

cv.nfeaturesLDA(data = matrix(rnorm(600), 60), cl = gl(3, 20),
    k = 5, cex.rg = c(0.5, 3), col.av = c("blue", "red"))

Arguments

data a data matrix containing the predictors in columns
cl a factor indicating the classification of the rows of data
k the number of folds
cex.rg the range of the magnification applied to the points in the plot
col.av two colors denoting, respectively, the rate of correct predictions in the i-th fold and the average rate over all k folds

Details

The procedure is like this:

Note that g_{max}, the largest number of features tried, is set by ani.options("nmax").
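
The following is a minimal sketch of this kind of cross-validation loop, written without the animation so the bookkeeping is easier to follow. It is not the package's source code. In particular, ranking the predictors by a one-way ANOVA F-statistic computed on the training rows is an assumption made here for illustration, as are the variable names fold, ranked, avg and g_max.

```r
library(MASS)  # provides lda()

set.seed(1)
data  <- matrix(rnorm(600), 60)  # 60 rows, 10 predictors (the Usage defaults)
cl    <- gl(3, 20)               # three classes of 20 rows each
k     <- 5                       # number of folds
g_max <- ncol(data)              # assumption: try every possible number of features

# Randomly assign each row to one of the k folds
fold <- sample(rep(seq_len(k), length.out = nrow(data)))

# accuracy[i, g]: rate of correct predictions on fold i with g features
accuracy <- matrix(NA_real_, nrow = k, ncol = g_max)

for (i in seq_len(k)) {
  test <- fold == i
  grp  <- cl[!test]
  # Assumption: rank predictors by their ANOVA F-statistic on the training rows
  fstat <- apply(data[!test, , drop = FALSE], 2, function(x) {
    anova(lm(x ~ grp))[1, "F value"]
  })
  ranked <- order(fstat, decreasing = TRUE)
  for (g in seq_len(g_max)) {
    vars <- ranked[seq_len(g)]
    fit  <- lda(data[!test, vars, drop = FALSE], grouping = grp)
    pred <- predict(fit, data[test, vars, drop = FALSE])$class
    accuracy[i, g] <- mean(pred == cl[test])
  }
}

avg     <- colMeans(accuracy)  # average rate over the k folds, per feature count
optimum <- which.max(avg)      # feature count with the highest average rate
```

Because the default data are pure noise, the rates hover around chance level (about 1/3) and the chosen optimum carries no real meaning; the sketch only shows how the fold-by-feature-count matrix is filled in.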

Value

A list containing

accuracy a matrix in which the element in the i-th row and j-th column is the rate of correct predictions from an LDA model built with j variables and evaluated on the data in the i-th fold (the test set)
optimum the optimum number of features based on the cross-validation
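
To make the layout of the result concrete, here is a small hand-made accuracy matrix (hypothetical numbers, not output from the function) with 3 folds in rows and 4 feature counts in columns. Taking the optimum as the column with the highest fold-averaged rate is an assumption about how the two components relate, not something stated above.

```r
# Hypothetical accuracy matrix: rows are folds, columns are numbers of features
accuracy <- matrix(c(0.50, 0.75, 0.75, 0.50,
                     0.25, 0.75, 0.50, 0.50,
                     0.50, 1.00, 0.75, 0.25),
                   nrow = 3, byrow = TRUE)

avg     <- colMeans(accuracy)  # average rate of correct predictions per feature count
optimum <- which.max(avg)      # column (number of features) with the highest average
```

With these numbers the second column has the highest average, so the assumed optimum is 2 features.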

Author(s)

Yihui Xie

References

Maindonald J, Braun J (2007). Data Analysis and Graphics Using R: An Example-Based Approach. 2nd edition. Cambridge University Press. p. 400.

http://animation.yihui.name/da:biostat:select_features_via_cv

See Also

kfcv, cv.ani, lda

Examples

op = par(pch = 19, mar = c(3, 3, 0.2, 0.7), mgp = c(1.5, 0.5, 0))
cv.nfeaturesLDA()
par(op)

## Not run: 

# save the animation in HTML pages
oopt = ani.options(ani.height = 480, ani.width = 600, outdir = getwd(),
    interval = 0.5, nmax = 10,
    title = "Cross-validation to find the optimum number of features in LDA",
    description = "This animation illustrates the process of
    finding out the optimum number of variables using k-fold cross-validation
    in a linear discriminant analysis (LDA).")
ani.start()
par(mar = c(3, 3, 1, 0.5), mgp = c(1.5, 0.5, 0), tcl = -0.3, pch = 19, cex = 1.5)
cv.nfeaturesLDA()
ani.stop()
ani.options(oopt)

## End(Not run)


[Package animation version 1.0-1 Index]